
    Backpropagation for long sequences: beyond memory constraints with constant overheads

    Naive backpropagation through time has a memory footprint that grows linearly in the sequence length, due to the need to store each state of the forward propagation. This is a problem for large networks. Strategies have been developed to trade memory for added computation, resulting in sublinear growth of the memory footprint or computation overhead. In this work, we present a library that uses asynchronous storing and prefetching to move data to and from slow, cheap storage. The library stores and prefetches states only as frequently as possible without delaying the computation, and uses the optimal Revolve backpropagation strategy for the computations in between. The memory footprint of the backpropagation can thus be reduced to any size (e.g. to fit into DRAM), while the computational overhead is constant in the sequence length and depends only on the ratio between compute and transfer times on a given hardware. We show in experiments that by exploiting asynchronous data transfer, our strategy is always at least as fast as, and usually faster than, the previously studied "optimal" strategies.
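The core trade-off the abstract describes can be sketched in a few lines: keep only every k-th forward state and recompute the rest during the backward sweep. This is a minimal illustrative stand-in (uniform stride, toy linear dynamics), not the paper's library or the Revolve schedule itself.

```python
# Checkpointed backpropagation through time: store every k-th forward
# state, recompute intermediate states on demand in the backward sweep.
# Memory drops from O(n) states to O(n / k), at the cost of recomputation.

def forward_step(state):
    # toy dynamics, a hypothetical stand-in for one timestep
    return state * 0.9 + 1.0

def backward_step(state, grad):
    # adjoint of forward_step with respect to its input
    return grad * 0.9

def checkpointed_backprop(x0, n_steps, k, grad_out):
    # Forward sweep: keep a checkpoint every k steps.
    checkpoints = {0: x0}
    state = x0
    for t in range(n_steps):
        state = forward_step(state)
        if (t + 1) % k == 0:
            checkpoints[t + 1] = state
    # Backward sweep: recompute states between checkpoints as needed.
    grad = grad_out
    for t in reversed(range(n_steps)):
        base = (t // k) * k            # nearest checkpoint at or before t
        s = checkpoints[base]
        for _ in range(t - base):      # recompute the state at time t
            s = forward_step(s)
        grad = backward_step(s, grad)
    return grad
```

The library described above goes further by overlapping these recomputations with asynchronous transfers to slow storage, so the recomputation cost is hidden behind data movement.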

    High-level python abstractions for optimal checkpointing in inversion problems

    Inversion and PDE-constrained optimization problems often rely on solving the adjoint problem to calculate the gradient of the objective function. This requires storing large amounts of intermediate data, setting a limit on the largest problem that can be solved with a given amount of available memory. Checkpointing is an approach that reduces the amount of memory required by redoing parts of the computation instead of storing intermediate results. The Revolve checkpointing algorithm offers an optimal schedule that trades computational cost for smaller memory footprints. Integrating Revolve into a modern Python HPC code and combining it with code generation is not straightforward. We present an API that makes checkpointing accessible from a DSL-based code-generation environment, along with initial performance figures focused on seismic applications.
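The shape of such an abstraction can be sketched as a driver that owns the checkpointing schedule and accepts forward and reverse operators as callbacks, so the DSL user never writes the storage/recomputation logic. All names here (`CheckpointDriver`, the uniform stride) are invented for illustration; the actual API and the Revolve schedule it wraps are more sophisticated.

```python
# Hypothetical checkpointing-driver API: the schedule lives in the driver,
# the numerics live in user-supplied forward/reverse operators.

class CheckpointDriver:
    """Runs a forward operator, storing states at a fixed stride, then
    drives the reverse operator, recomputing intermediate states."""

    def __init__(self, fwd, rev, n_steps, stride):
        self.fwd, self.rev = fwd, rev
        self.n_steps, self.stride = n_steps, stride
        self.store = {}

    def apply_forward(self, state):
        self.store[0] = state
        for t in range(self.n_steps):
            state = self.fwd(state)
            if (t + 1) % self.stride == 0:
                self.store[t + 1] = state
        return state

    def apply_reverse(self, adjoint):
        for t in reversed(range(self.n_steps)):
            base = (t // self.stride) * self.stride
            state = self.store[base]
            for _ in range(t - base):   # recompute up to time t
                state = self.fwd(state)
            adjoint = self.rev(state, adjoint)
        return adjoint
```

Separating schedule from operators is what lets a code-generation environment swap in generated kernels for `fwd` and `rev` without touching the checkpointing logic.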

    Architecture and performance of Devito, a system for automated stencil computation

    Stencil computations are a key part of many high-performance computing applications, such as image processing, convolutional neural networks, and finite-difference solvers for partial differential equations. Devito is a framework capable of generating highly-optimized code given symbolic equations expressed in Python, specialized in, but not limited to, affine (stencil) codes. The lowering process -- from mathematical equations down to C++ code -- is performed by the Devito compiler through a series of intermediate representations. Several performance optimizations are introduced, including advanced common sub-expression elimination, tiling and parallelization. Some of these are obtained through well-established stencil optimizers, integrated in the back-end of the Devito compiler. The architecture of the Devito compiler, as well as the performance optimizations that are applied when generating code, are presented. The effectiveness of such performance optimizations is demonstrated using operators drawn from seismic imaging applications.
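As an illustration of the kind of loop nest such a compiler lowers a symbolic equation to, here is a 5-point Laplacian sweep with simple loop tiling, written in plain Python for readability (the generated code would be C/C++ with OpenMP, and the tile size would be tuned, not fixed).

```python
# 5-point stencil over the interior of a 2D grid, swept in tile x tile
# blocks to improve cache locality -- the structure a stencil compiler
# emits, independent of the symbolic front-end.

def laplacian_tiled(u, nx, ny, tile=4):
    """u is a list of lists; returns the 5-point Laplacian stencil
    applied to the interior points, visiting the grid block by block."""
    out = [[0.0] * ny for _ in range(nx)]
    for bi in range(1, nx - 1, tile):
        for bj in range(1, ny - 1, tile):
            for i in range(bi, min(bi + tile, nx - 1)):
                for j in range(bj, min(bj + tile, ny - 1)):
                    out[i][j] = (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1]
                                 - 4.0 * u[i][j])
    return out
```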

    Assessment of Olfactory Function in MAPT-Associated Neurodegenerative Disease Reveals Odor-Identification Irreproducibility as a Non-Disease-Specific, General Characteristic of Olfactory Dysfunction

    Olfactory dysfunction is associated with normal aging, multiple neurodegenerative disorders, including Parkinson’s disease, Lewy body disease and Alzheimer’s disease, and other diseases such as diabetes, sleep apnea and the autoimmune disease myasthenia gravis. The wide spectrum of neurodegenerative disorders associated with olfactory dysfunction suggests different, potentially overlapping, underlying pathophysiologies. Studying olfactory dysfunction in presymptomatic carriers of mutations known to cause familial parkinsonism provides unique opportunities to understand the role of genetic factors, delineate the salient characteristics of the onset of olfactory dysfunction, and understand when it starts relative to motor and cognitive symptoms. We evaluated olfactory dysfunction in 28 carriers of two MAPT mutations (p.N279K, p.P301L), which cause frontotemporal dementia with parkinsonism, using the University of Pennsylvania Smell Identification Test. Olfactory dysfunction in carriers does not appear to be allele specific, but is strongly age-dependent and precedes symptomatic onset. Severe olfactory dysfunction, however, is not a fully penetrant trait at the time of symptom onset. Principal component analysis revealed that olfactory dysfunction is not odor-class specific, even though individual odor responses cluster kindred members according to genetic and disease status. Strikingly, carriers with incipient olfactory dysfunction show poor inter-test consistency among the sets of odors identified incorrectly in successive replicate tests, even before severe olfactory dysfunction appears. Furthermore, when 78 individuals without neurodegenerative disease and 14 individuals with sporadic Parkinson’s disease were evaluated twice at a one-year interval using the Brief Smell Identification Test, the majority also showed inconsistency in the sets of odors they identified incorrectly, independent of age and cognitive status. 
While these findings may reflect the limitations of the tests used and the sample sizes, olfactory dysfunction appears to be associated with the inability to identify odors reliably and consistently, not with the loss of an ability to identify specific odors. Irreproducibility in odor identification appears to be a non-disease-specific, general feature of olfactory dysfunction that is accelerated or accentuated in neurodegenerative disease. It may reflect a fundamental organizational principle of the olfactory system, which is more “error-prone” than other sensory systems.

    Lossy Checkpoint Compression in Full Waveform Inversion

    This paper proposes a new method that combines checkpointing methods with error-controlled lossy compression for large-scale high-performance Full-Waveform Inversion (FWI), an inverse problem commonly used in geophysical exploration. This combination can significantly reduce data movement, allowing a reduction in run time as well as peak memory. In the Exascale computing era, frequent data transfer (e.g., memory bandwidth, PCIe bandwidth for GPUs, or network) is the performance bottleneck rather than the peak FLOPS of the processing unit. Like many other adjoint-based optimization problems, FWI is costly in terms of the number of floating-point operations, large memory footprint during backpropagation, and data transfer overheads. Past work for adjoint methods has developed checkpointing methods that reduce the peak memory requirements during backpropagation at the cost of additional floating-point computations. Combining this traditional checkpointing with error-controlled lossy compression, we explore the three-way tradeoff between memory, precision, and time to solution. We investigate how approximation errors introduced by lossy compression of the forward solution impact the objective function gradient and final inverted solution. Empirical results from these numerical experiments indicate that high lossy-compression rates (compression factors ranging up to 100) have a relatively minor impact on convergence rates and the quality of the final solution.
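The "error-controlled" part of the contract can be illustrated with a toy uniform quantizer: the user picks an absolute error bound eps, and the round trip is guaranteed to satisfy |x - decompress(compress(x))| <= eps. This is the same guarantee production compressors such as ZFP or SZ provide, but the scheme below is a simple stand-in, not the compressor used in the paper.

```python
# Error-bounded lossy compression by uniform quantization: each value is
# mapped to an integer bin of width 2 * eps, so the reconstruction error
# is at most eps. The integer codes are much more compressible than the
# original floats.

def compress(values, eps):
    # x is stored as the integer q with x ~ q * 2 * eps
    return [round(v / (2 * eps)) for v in values]

def decompress(quants, eps):
    return [q * 2 * eps for q in quants]
```

In a checkpointing scheme, `compress` is applied to each forward state before it is stored and `decompress` on retrieval, trading a bounded perturbation of the gradient for a smaller memory footprint.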

    Automatic differentiation for adjoint stencil loops

    Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable. In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications.
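The key observation is that the adjoint of a stencil is itself a stencil with transposed (flipped) coefficients: naive reverse-mode AD produces a scatter loop with `+=` updates, which the transformation rewrites as a gather loop with the same access pattern as the forward sweep. A toy 1D example of both forms (hand-derived here, not PerforAD output):

```python
# Forward 3-point stencil: y[k] = a*x[k] + b*x[k+1] + c*x[k+2]
# (interior points only, k = 0 .. n-3).

def forward(x, a, b, c):
    n = len(x)
    return [a * x[i - 1] + b * x[i] + c * x[i + 1] for i in range(1, n - 1)]

def adjoint_scatter(ybar, a, b, c, n):
    # Naive reverse-mode AD: scattered += updates, racy if parallelised.
    xbar = [0.0] * n
    for k, i in enumerate(range(1, n - 1)):
        xbar[i - 1] += a * ybar[k]
        xbar[i] += b * ybar[k]
        xbar[i + 1] += c * ybar[k]
    return xbar

def adjoint_gather(ybar, a, b, c, n):
    # Transformed adjoint: a gather with flipped coefficients,
    # parallelisable exactly like the forward stencil.
    def yb(k):
        return ybar[k] if 0 <= k < len(ybar) else 0.0
    return [a * yb(j) + b * yb(j - 1) + c * yb(j - 2) for j in range(n)]
```

Both adjoints compute the same derivative; only the gather form keeps the stencil-like, write-disjoint memory access pattern that stencil optimisers expect.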

    TROPHY: Trust Region Optimization Using a Precision Hierarchy

    We present an algorithm to perform trust-region-based optimization for nonlinear unconstrained problems. The method selectively uses function and gradient evaluations at different floating-point precisions to reduce the overall energy consumption, storage, and communication costs; these capabilities are increasingly important in the era of exascale computing. In particular, we are motivated by a desire to improve computational efficiency for massive climate models. We employ our method on two examples: the CUTEst test set and a large-scale data assimilation problem to recover wind fields from radar returns. Although this paper is primarily a proof of concept, we show that if implemented on appropriate hardware, the use of mixed precision can significantly reduce the computational load compared with fixed-precision solvers.
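The precision-hierarchy idea can be sketched as follows: evaluate the objective in a cheap low precision first, and escalate only when the estimated rounding error is large relative to the reduction the trust-region model predicts. Precision is emulated here by truncating the significand; the objective, the error estimate, and the threshold are all illustrative choices, not TROPHY's actual implementation.

```python
# Toy precision hierarchy: half-, single-, and double-like significands
# (11, 24, 53 bits), selected dynamically against the predicted reduction.

import math

def truncate(x, bits):
    # emulate a float with `bits` significand bits
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)
    scale = 2.0 ** bits
    return math.ldexp(math.floor(m * scale) / scale, e)

def f_at_precision(x, bits):
    # hypothetical objective, evaluated with truncated arithmetic
    return truncate(truncate(x * x, bits) + truncate(math.sin(x), bits), bits)

def evaluate(x, pred_reduction, levels=(11, 24, 53)):
    # pick the cheapest precision whose rounding error is small
    # compared with the reduction the trust-region model predicts
    for bits in levels:
        err = abs(f_at_precision(x, bits)) * 2.0 ** (1 - bits)
        if err < 0.1 * abs(pred_reduction):
            return f_at_precision(x, bits), bits
    return f_at_precision(x, levels[-1]), levels[-1]
```

Early iterations, where predicted reductions are large, run almost entirely in low precision; only near convergence does the solver pay for high-precision evaluations.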

    Optimised finite difference computation from symbolic equations

    Domain-specific high-productivity environments are playing an increasingly important role in scientific computing due to the levels of abstraction and automation they provide. In this paper we introduce Devito, an open-source domain-specific framework for solving partial differential equations from symbolic problem definitions by the finite difference method. We highlight the generation and automated execution of highly optimized stencil code from only a few lines of high-level symbolic Python for a set of scientific equations, before exploring the use of Devito operators in seismic inversion problems.
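The symbolic-definition-to-executable-code workflow can be illustrated at toy scale: a stencil given as (offset, coefficient) pairs is lowered to Python source, compiled with `exec`, and applied. Devito does this at far greater sophistication, emitting optimized C from SymPy expressions; every name below is illustrative.

```python
# Minimal code generation: lower a symbolic 1D stencil specification to a
# compiled kernel. Real DSL compilers emit and build C, but the pipeline
# shape (symbolic spec -> source -> compiled operator) is the same.

def generate_operator(stencil):
    # stencil: list of (offset, coeff) pairs defining the update at point i
    body = " + ".join(f"{c!r} * u[i + {o}]" for o, c in stencil)
    src = (
        "def op(u):\n"
        "    n = len(u)\n"
        f"    return [{body} for i in range(1, n - 1)]\n"
    )
    namespace = {}
    exec(src, namespace)   # "compile" the generated kernel
    return namespace["op"]
```

For example, `generate_operator([(-1, 1.0), (0, -2.0), (1, 1.0)])` yields a second-difference operator, the 1D analogue of the stencils Devito generates for wave propagation.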